Exploração inicial dos dados¶
Conjunto de Dados: Vendas de Veículos e Tendências de Mercado¶
O "Vehicle Sales and Market Trends Dataset" oferece uma coleção abrangente de informações relacionadas às transações de venda de diversos veículos.
Detalhes incluídos no conjunto de dados:¶
- Year: Ano de fabricação do veículo.
- Make: Marca do veículo (ex: Ford, Toyota, etc.).
- Model: Modelo do veículo.
- Trim: Versão ou acabamento específico do modelo.
- Body Type: Tipo de carroceria (ex: sedan, SUV, hatch).
- Transmission Type: Tipo de transmissão (manual, automática, etc.).
- VIN (Vehicle Identification Number): Número de identificação único do veículo.
- State of Registration: Estado onde o veículo foi registrado.
- Condition Rating: Classificação da condição do veículo (ex: excelente, bom, regular).
- Odometer Reading: Quilometragem registrada no hodômetro.
- Exterior Color: Cor externa do veículo.
- Interior Color: Cor interna do veículo.
- Seller Information: Informações sobre o vendedor do veículo.
- Manheim Market Report (MMR) Values: Valores estimados de mercado segundo o relatório MMR.
- Selling Prices: Preços reais de venda dos veículos.
- Sale Dates: Datas em que as vendas ocorreram.
Este conjunto de dados é ideal para análises de mercado automotivo, estudos de depreciação de veículos, avaliação de desempenho de vendas e muito mais.
URL: https://www.kaggle.com/datasets/syedanwarafridi/vehicle-sales-data/data
Load dos dados¶
In [102]:
import kagglehub
from kagglehub import KaggleDatasetAdapter
import os
import pandas as pd
import glob
import shutil
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
from datetime import datetime
import seaborn as sns
import chart_studio.plotly as py
import cufflinks as cf
import plotly.express as px
%matplotlib inline
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
cf.go_offline()
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
warnings.filterwarnings('ignore')
import plotly.graph_objects as go
In [103]:
# pastas
BASE_DIR = os.getcwd()
NOTE_DIR = os.path.dirname(BASE_DIR)
APP_DIR = os.path.dirname(BASE_DIR)
DATA_DIR = os.path.join(APP_DIR, 'data')
DATASET_DIR = os.path.join(DATA_DIR, 'vehicle-sales-data')
# Cria o diretório de dados se não existir
os.makedirs(DATA_DIR, exist_ok=True)
In [104]:
# Baixar a versão mais recente
dataset = kagglehub.dataset_download("syedanwarafridi/vehicle-sales-data")
In [105]:
# Lista os arquivos no conjunto de dados baixado
dataset_files = glob.glob(os.path.join(dataset, "*.csv"))
print("Files in the dataset:")
for file in dataset_files:
print(f" - {os.path.basename(file)}")
Files in the dataset: - car_prices.csv
In [106]:
# Copia os arquivos para o diretório de dados do projeto para acesso mais fácil
os.makedirs(DATASET_DIR, exist_ok=True)
for file in dataset_files:
dest_file = os.path.join(DATASET_DIR, os.path.basename(file))
shutil.copy(file, dest_file)
print(f"Copied {os.path.basename(file)} to {DATASET_DIR}")
Copied car_prices.csv to d:\TOTVS-ATLAS\APP\data\vehicle-sales-data
In [107]:
# Carrega o primeiro arquivo CSV (ajuste se houver vários arquivos)
if dataset_files:
df = pd.read_csv(dataset_files[0])
print(f"\nLoaded {os.path.basename(dataset_files[0])}")
print(f"Shape: {df.shape}")
# Exibe as primeiras linhas
print("\nPreview of the data:")
display(df.head())
else:
print("No CSV files found in the dataset")
Loaded car_prices.csv Shape: (558837, 16) Preview of the data:
| year | make | model | trim | body | transmission | vin | state | condition | odometer | color | interior | seller | mmr | sellingprice | saledate | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | Kia | Sorento | LX | SUV | automatic | 5xyktca69fg566472 | ca | 5.0 | 16639.0 | white | black | kia motors america inc | 20500.0 | 21500.0 | Tue Dec 16 2014 12:30:00 GMT-0800 (PST) |
| 1 | 2015 | Kia | Sorento | LX | SUV | automatic | 5xyktca69fg561319 | ca | 5.0 | 9393.0 | white | beige | kia motors america inc | 20800.0 | 21500.0 | Tue Dec 16 2014 12:30:00 GMT-0800 (PST) |
| 2 | 2014 | BMW | 3 Series | 328i SULEV | Sedan | automatic | wba3c1c51ek116351 | ca | 45.0 | 1331.0 | gray | black | financial services remarketing (lease) | 31900.0 | 30000.0 | Thu Jan 15 2015 04:30:00 GMT-0800 (PST) |
| 3 | 2015 | Volvo | S60 | T5 | Sedan | automatic | yv1612tb4f1310987 | ca | 41.0 | 14282.0 | white | black | volvo na rep/world omni | 27500.0 | 27750.0 | Thu Jan 29 2015 04:30:00 GMT-0800 (PST) |
| 4 | 2014 | BMW | 6 Series Gran Coupe | 650i | Sedan | automatic | wba6b2c57ed129731 | ca | 43.0 | 2641.0 | gray | black | financial services remarketing (lease) | 66000.0 | 67000.0 | Thu Dec 18 2014 12:30:00 GMT-0800 (PST) |
Analises iniciais (exploração)¶
In [108]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 558837 entries, 0 to 558836 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 558837 non-null int64 1 make 548536 non-null object 2 model 548438 non-null object 3 trim 548186 non-null object 4 body 545642 non-null object 5 transmission 493485 non-null object 6 vin 558833 non-null object 7 state 558837 non-null object 8 condition 547017 non-null float64 9 odometer 558743 non-null float64 10 color 558088 non-null object 11 interior 558088 non-null object 12 seller 558837 non-null object 13 mmr 558799 non-null float64 14 sellingprice 558825 non-null float64 15 saledate 558825 non-null object dtypes: float64(4), int64(1), object(11) memory usage: 68.2+ MB
In [109]:
df.apply(lambda x: f"{x.dtype} {x.nunique()} {x.unique()}")
Out[109]:
year int64 34 [2015 2014 2013 2012 2011 2010 2009 2... make object 96 ['Kia' 'BMW' 'Volvo' 'Nissan' 'Chevr... model object 973 ['Sorento' '3 Series' 'S60' '6 Seri... trim object 1963 ['LX' '328i SULEV' 'T5' ... 'pure'... body object 87 ['SUV' 'Sedan' 'Convertible' 'Coupe'... transmission object 4 ['automatic' nan 'manual' 'Sedan' 'se... vin object 550297 ['5xyktca69fg566472' '5xyktca69f... state object 64 ['ca' 'tx' 'pa' 'mn' 'az' 'wi' 'tn' ... condition float64 41 [ 5. 45. 41. 43. 1. 34. 2. 42. 3... odometer float64 172278 [ 16639. 9393. 1331. ... 20... color object 46 ['white' 'gray' 'black' 'red' 'silve... interior object 17 ['black' 'beige' 'tan' '—' 'gray' 'b... seller object 14263 ['kia motors america inc' 'finan... mmr float64 1101 [ 20500. 20800. 31900. ... 1820... sellingprice float64 1887 [ 21500. 30000. 27750. ... 1690... saledate object 3766 ['Tue Dec 16 2014 12:30:00 GMT-080... dtype: object
In [110]:
df.describe()
Out[110]:
| year | condition | odometer | mmr | sellingprice | |
|---|---|---|---|---|---|
| count | 558837.000000 | 547017.000000 | 558743.000000 | 558799.000000 | 558825.000000 |
| mean | 2010.038927 | 30.672365 | 68320.017767 | 13769.377495 | 13611.358810 |
| std | 3.966864 | 13.402832 | 53398.542821 | 9679.967174 | 9749.501628 |
| min | 1982.000000 | 1.000000 | 1.000000 | 25.000000 | 1.000000 |
| 25% | 2007.000000 | 23.000000 | 28371.000000 | 7100.000000 | 6900.000000 |
| 50% | 2012.000000 | 35.000000 | 52254.000000 | 12250.000000 | 12100.000000 |
| 75% | 2013.000000 | 42.000000 | 99109.000000 | 18300.000000 | 18200.000000 |
| max | 2015.000000 | 49.000000 | 999999.000000 | 182000.000000 | 230000.000000 |
In [111]:
df.describe(include='O')
Out[111]:
| make | model | trim | body | transmission | vin | state | color | interior | seller | saledate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 548536 | 548438 | 548186 | 545642 | 493485 | 558833 | 558837 | 558088 | 558088 | 558837 | 558825 |
| unique | 96 | 973 | 1963 | 87 | 4 | 550297 | 64 | 46 | 17 | 14263 | 3766 |
| top | Ford | Altima | Base | Sedan | automatic | automatic | fl | black | black | nissan-infiniti lt | Tue Feb 10 2015 01:30:00 GMT-0800 (PST) |
| freq | 93554 | 19349 | 55817 | 199437 | 475915 | 22 | 82945 | 110970 | 244329 | 19693 | 5334 |
In [112]:
df.isna().sum()
Out[112]:
year 0 make 10301 model 10399 trim 10651 body 13195 transmission 65352 vin 4 state 0 condition 11820 odometer 94 color 749 interior 749 seller 0 mmr 38 sellingprice 12 saledate 12 dtype: int64
In [113]:
df.nunique()
Out[113]:
year 34 make 96 model 973 trim 1963 body 87 transmission 4 vin 550297 state 64 condition 41 odometer 172278 color 46 interior 17 seller 14263 mmr 1101 sellingprice 1887 saledate 3766 dtype: int64
Limpesa dos dados¶
In [114]:
df.drop(columns=['vin'], inplace=True)
In [115]:
df.dropna(inplace=True)
In [116]:
df.info()
<class 'pandas.core.frame.DataFrame'> Index: 472325 entries, 0 to 558836 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 year 472325 non-null int64 1 make 472325 non-null object 2 model 472325 non-null object 3 trim 472325 non-null object 4 body 472325 non-null object 5 transmission 472325 non-null object 6 state 472325 non-null object 7 condition 472325 non-null float64 8 odometer 472325 non-null float64 9 color 472325 non-null object 10 interior 472325 non-null object 11 seller 472325 non-null object 12 mmr 472325 non-null float64 13 sellingprice 472325 non-null float64 14 saledate 472325 non-null object dtypes: float64(4), int64(1), object(10) memory usage: 57.7+ MB
Análises Gráficas¶
In [117]:
df.columns.to_list()
Out[117]:
['year', 'make', 'model', 'trim', 'body', 'transmission', 'state', 'condition', 'odometer', 'color', 'interior', 'seller', 'mmr', 'sellingprice', 'saledate']
In [118]:
YEAR = df['year'].value_counts()
In [119]:
px.bar(YEAR,title= "Distribuição do ano de produção de todos os carros")
In [120]:
MODEL_LABEL = df['model'].values
In [121]:
MODEL_counts = df['model'].value_counts()
In [122]:
MODEL_count = MODEL_counts[MODEL_counts > 1000]
MODEL_count
Out[122]:
model
Altima 16346
Fusion 12116
F-150 11950
Camry 10986
Escape 10656
...
HHR 1040
Avalanche 1023
Frontier 1006
F-350 Super Duty 1005
MKX 1001
Name: count, Length: 115, dtype: int64
In [123]:
# Cria um gráfico de pizza com tamanho aumentado
fig = go.Figure(data=[go.Pie(labels=MODEL_LABEL, values=MODEL_count)])
# Define o título e aumenta o tamanho
fig.update_layout(
title_text='Distribuição de Modelos de Carros',
width=900, # Aumenta a largura do gráfico
height=700, # Aumenta a altura do gráfico
legend=dict( # Melhora a legenda
font=dict(size=10),
orientation="v" # Orientação vertical da legenda
)
)
# Adiciona opções para melhorar a visualização
fig.update_traces(
textposition='inside', # Texto dentro das fatias
textinfo='percent+label', # Mostra percentual e rótulo
hole=0.3 # Cria efeito "donut" (opcional)
)
# Exibir o gráfico
fig.show()
In [124]:
px.bar(MODEL_count,title= "Distribuição de ano de produção de todos os carros")
In [125]:
df.groupby("model")["year"].sum()
Out[125]:
model
1 Series 729953
1500 5712217
190-Class 5979
2 Series 64461
200 7478487
...
mdx 2010
tC 1288497
xA 42100
xB 870084
xD 341868
Name: year, Length: 768, dtype: int64
In [126]:
MAKE = df.make.value_counts()
In [127]:
px.bar(MAKE,title= "Distribuição de ano de produção de todos os carros")
In [128]:
MM= pd.crosstab(df.model,df["make"])
In [129]:
px.bar(MM,title= "Distribuição de MM todos os carros")
In [130]:
TRIM = df.trim.value_counts()
In [131]:
TRIM = TRIM[TRIM > 10]
TRIM
Out[131]:
trim
Base 48779
SE 39633
LX 18706
Limited 15506
LT 15273
...
SE S/C 11
L100 11
Autobiography 11
R320 CDI 11
L 35th Anniversary Edition 11
Name: count, Length: 966, dtype: int64
In [132]:
px.bar(TRIM,title= "Distribuição de TRIM de todos os carros", color_discrete_sequence=["red"])
In [133]:
TTM = pd.crosstab(df.trim,df["make"])
In [134]:
px.line(TTM,title= "Distribuição de TTM de todos os carros")
In [135]:
BODY = df['body'].value_counts()
In [136]:
px.bar(BODY,title= " Distribuição de BODY de todos os carros")
In [137]:
df.columns.to_list()
Out[137]:
['year', 'make', 'model', 'trim', 'body', 'transmission', 'state', 'condition', 'odometer', 'color', 'interior', 'seller', 'mmr', 'sellingprice', 'saledate']
In [138]:
transmission_LABEL = df['transmission'].values
transmission_LABEL
Out[138]:
array(['automatic', 'automatic', 'automatic', ..., 'automatic',
'automatic', 'automatic'], shape=(472325,), dtype=object)
In [139]:
transmission_counts= df['transmission'].value_counts()
transmission_counts
Out[139]:
transmission automatic 455963 manual 16362 Name: count, dtype: int64
In [140]:
px.bar(transmission_counts,title= "Distribution of BODY all cars,")
In [141]:
pd.crosstab(df["trim"],df["transmission"])
Out[141]:
| transmission | automatic | manual |
|---|---|---|
| trim | ||
| ! | 361 | 38 |
| (1999.5) XE | 4 | 1 |
| + | 1389 | 22 |
| 1 | 34 | 10 |
| 1.6 | 18 | 6 |
| ... | ... | ... |
| xDrive35d | 231 | 0 |
| xDrive35i | 523 | 0 |
| xDrive35i Premium | 244 | 0 |
| xDrive48i | 48 | 0 |
| xDrive50i | 244 | 0 |
1494 rows × 2 columns
In [142]:
YT = pd.crosstab(df["year"],df["transmission"])
In [143]:
px.line(YT,title= "Distribution of SY all cars")
In [144]:
plt.figure(figsize=(8, 6))
transmission_counts.plot(kind='pie')
plt.title('A Number of transmission counts ')
plt.show()
In [145]:
STATE = df['state'].value_counts()
In [146]:
STATE
Out[146]:
state fl 75243 ca 66213 tx 41657 ga 30939 pa 24117 nj 23003 il 21494 oh 20294 tn 19183 nc 18731 mo 15286 mi 14185 nv 11682 md 9788 va 9448 wi 9196 mn 8627 az 7612 wa 7012 co 6270 ma 6167 ny 4749 in 3879 sc 3754 ne 3685 pr 2427 la 1744 ms 1730 ut 1716 hi 1205 or 1047 nm 163 ok 54 al 25 Name: count, dtype: int64
In [147]:
ST = pd.crosstab(df["state"],df["transmission"])
In [148]:
px.line(ST,title= "Distribution of ST all cars")
In [149]:
SM = pd.crosstab(df["state"],df["make"]).sum()
In [150]:
px.bar(SM,title= "Distribution of SM all cars")
In [151]:
SB = pd.crosstab(df["state"],df["body"])
In [152]:
px.line(SB,title= "Distribution of SB all cars")
In [153]:
SY = pd.crosstab(df["state"],df["year"]).sum()
In [154]:
px.bar(SY,title= "Distribution of SY all cars")
In [155]:
df.columns.to_list()
Out[155]:
['year', 'make', 'model', 'trim', 'body', 'transmission', 'state', 'condition', 'odometer', 'color', 'interior', 'seller', 'mmr', 'sellingprice', 'saledate']
In [156]:
Condition= df['condition'].value_counts()
In [157]:
px.bar(Condition,title= "Distribution of SY all cars")
In [158]:
CY = pd.crosstab(df["condition"],df["year"]).sum()
In [159]:
px.line(CY,title= "Distribution of CY all cars")
In [ ]:
In [160]:
# Cria um gráfico de pizza
fig = go.Figure(data=[go.Pie(labels=df["year"], values=df["condition"])])
# Define o título
fig.update_layout(title_text='Gráfico de Pizza Exemplo')
# Exibir o gráfico
fig.show()
In [161]:
px.bar(CY,title= "Distribuição de SY de todos os carros")
In [162]:
CMM = pd.crosstab(df["condition"],df["make"])
In [163]:
px.line(CMM,title= "Distribuição de CMM de todos os carros")
In [164]:
px.bar(CMM,title= "Distribuição de CMM de todos os carros")
In [165]:
CM = pd.crosstab(df["condition"],df["model"]).sum()
In [166]:
px.line(CM,title= "Distribuição de CM de todos os carros")
In [167]:
CT = pd.crosstab(df["condition"],df["transmission"])
In [168]:
px.line(CT,title= "Distribuição de CT de todos os carros")
In [169]:
CS = pd.crosstab(df["condition"],df["state"])
In [170]:
CS
Out[170]:
| state | al | az | ca | co | fl | ga | hi | il | in | la | ... | or | pa | pr | sc | tn | tx | ut | va | wa | wi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| condition | |||||||||||||||||||||
| 1.0 | 1 | 101 | 1294 | 24 | 433 | 138 | 4 | 133 | 110 | 71 | ... | 33 | 153 | 29 | 126 | 119 | 439 | 141 | 252 | 76 | 312 |
| 2.0 | 0 | 261 | 2851 | 126 | 2538 | 1184 | 25 | 680 | 263 | 62 | ... | 100 | 747 | 91 | 164 | 337 | 1411 | 119 | 625 | 301 | 399 |
| 3.0 | 0 | 131 | 1472 | 212 | 1554 | 353 | 43 | 1127 | 81 | 30 | ... | 20 | 811 | 77 | 134 | 144 | 674 | 29 | 158 | 125 | 108 |
| 4.0 | 0 | 197 | 1756 | 295 | 4325 | 616 | 45 | 2082 | 66 | 38 | ... | 28 | 1483 | 52 | 114 | 481 | 988 | 16 | 347 | 165 | 167 |
| 5.0 | 0 | 43 | 837 | 177 | 1386 | 553 | 15 | 622 | 47 | 18 | ... | 12 | 427 | 2 | 52 | 786 | 692 | 2 | 165 | 185 | 180 |
| 11.0 | 0 | 0 | 14 | 1 | 8 | 5 | 1 | 2 | 0 | 1 | ... | 0 | 3 | 0 | 0 | 2 | 18 | 1 | 2 | 2 | 3 |
| 12.0 | 0 | 0 | 19 | 2 | 7 | 5 | 0 | 2 | 1 | 1 | ... | 1 | 2 | 0 | 0 | 4 | 14 | 1 | 1 | 2 | 5 |
| 13.0 | 0 | 2 | 8 | 0 | 10 | 3 | 0 | 3 | 1 | 1 | ... | 1 | 3 | 0 | 0 | 0 | 18 | 0 | 3 | 0 | 2 |
| 14.0 | 0 | 1 | 15 | 1 | 20 | 5 | 1 | 6 | 0 | 1 | ... | 0 | 6 | 1 | 0 | 6 | 20 | 0 | 1 | 1 | 3 |
| 15.0 | 0 | 4 | 13 | 1 | 11 | 10 | 1 | 2 | 0 | 1 | ... | 1 | 1 | 0 | 2 | 5 | 27 | 1 | 4 | 3 | 6 |
| 16.0 | 0 | 5 | 17 | 1 | 15 | 8 | 2 | 1 | 2 | 2 | ... | 0 | 3 | 1 | 0 | 5 | 29 | 5 | 4 | 1 | 5 |
| 17.0 | 0 | 3 | 40 | 0 | 27 | 7 | 0 | 8 | 3 | 0 | ... | 0 | 2 | 3 | 4 | 6 | 40 | 2 | 5 | 2 | 3 |
| 18.0 | 0 | 5 | 47 | 1 | 25 | 17 | 4 | 6 | 5 | 1 | ... | 1 | 8 | 1 | 1 | 6 | 61 | 2 | 8 | 4 | 9 |
| 19.0 | 5 | 1041 | 4099 | 479 | 6791 | 2819 | 94 | 1508 | 238 | 194 | ... | 49 | 1076 | 119 | 250 | 888 | 3110 | 256 | 1400 | 281 | 916 |
| 21.0 | 1 | 176 | 1088 | 87 | 1289 | 588 | 23 | 304 | 60 | 36 | ... | 15 | 268 | 38 | 66 | 195 | 746 | 48 | 254 | 72 | 185 |
| 22.0 | 1 | 105 | 685 | 53 | 861 | 360 | 14 | 175 | 43 | 20 | ... | 13 | 206 | 23 | 49 | 128 | 486 | 27 | 184 | 59 | 124 |
| 23.0 | 1 | 127 | 945 | 74 | 1209 | 513 | 16 | 266 | 57 | 33 | ... | 21 | 285 | 41 | 48 | 175 | 664 | 36 | 219 | 92 | 143 |
| 24.0 | 1 | 167 | 1136 | 106 | 1249 | 609 | 24 | 323 | 55 | 37 | ... | 22 | 315 | 45 | 58 | 206 | 791 | 50 | 226 | 92 | 181 |
| 25.0 | 1 | 215 | 1527 | 135 | 1675 | 782 | 33 | 461 | 76 | 32 | ... | 17 | 447 | 66 | 115 | 290 | 1028 | 64 | 339 | 127 | 230 |
| 26.0 | 1 | 192 | 1553 | 107 | 1507 | 783 | 28 | 437 | 71 | 43 | ... | 18 | 411 | 57 | 73 | 259 | 1028 | 63 | 327 | 117 | 220 |
| 27.0 | 1 | 265 | 2000 | 205 | 1992 | 1063 | 35 | 528 | 106 | 61 | ... | 32 | 604 | 82 | 151 | 420 | 1357 | 82 | 449 | 182 | 276 |
| 28.0 | 0 | 306 | 2424 | 184 | 2188 | 1282 | 46 | 609 | 131 | 61 | ... | 33 | 750 | 93 | 160 | 495 | 1641 | 92 | 478 | 217 | 363 |
| 29.0 | 1 | 333 | 2447 | 193 | 2199 | 1165 | 50 | 668 | 119 | 63 | ... | 32 | 742 | 80 | 139 | 464 | 1634 | 86 | 475 | 217 | 357 |
| 31.0 | 1 | 155 | 1113 | 79 | 1093 | 603 | 14 | 299 | 60 | 26 | ... | 17 | 464 | 42 | 85 | 244 | 804 | 31 | 186 | 107 | 157 |
| 32.0 | 0 | 175 | 1312 | 99 | 1238 | 604 | 20 | 311 | 59 | 37 | ... | 20 | 511 | 75 | 77 | 281 | 811 | 29 | 213 | 106 | 172 |
| 33.0 | 2 | 189 | 1414 | 110 | 1322 | 648 | 21 | 377 | 58 | 31 | ... | 17 | 457 | 63 | 81 | 299 | 841 | 32 | 229 | 140 | 171 |
| 34.0 | 0 | 301 | 2233 | 168 | 2178 | 990 | 36 | 600 | 120 | 73 | ... | 41 | 812 | 104 | 148 | 504 | 1430 | 55 | 302 | 181 | 318 |
| 35.0 | 1 | 458 | 3514 | 247 | 3274 | 1539 | 68 | 919 | 166 | 115 | ... | 50 | 1236 | 170 | 198 | 855 | 2139 | 79 | 451 | 304 | 421 |
| 36.0 | 3 | 378 | 3041 | 269 | 2924 | 1292 | 38 | 710 | 156 | 104 | ... | 53 | 1113 | 157 | 184 | 766 | 1920 | 78 | 336 | 294 | 393 |
| 37.0 | 0 | 455 | 3435 | 308 | 3288 | 1439 | 54 | 779 | 157 | 96 | ... | 64 | 1305 | 147 | 188 | 877 | 2241 | 88 | 332 | 358 | 373 |
| 38.0 | 1 | 340 | 2647 | 223 | 2426 | 1007 | 43 | 574 | 144 | 67 | ... | 26 | 871 | 100 | 131 | 662 | 1558 | 50 | 220 | 300 | 284 |
| 39.0 | 0 | 279 | 2699 | 236 | 2562 | 1042 | 61 | 641 | 156 | 70 | ... | 55 | 885 | 110 | 123 | 771 | 1719 | 35 | 213 | 310 | 316 |
| 41.0 | 1 | 285 | 3113 | 257 | 3191 | 1220 | 50 | 676 | 151 | 59 | ... | 50 | 1233 | 102 | 144 | 925 | 1749 | 27 | 211 | 334 | 328 |
| 42.0 | 0 | 272 | 3105 | 312 | 3514 | 1266 | 56 | 823 | 182 | 66 | ... | 51 | 1245 | 102 | 141 | 997 | 1858 | 19 | 193 | 357 | 370 |
| 43.0 | 1 | 222 | 3190 | 292 | 3702 | 1363 | 58 | 852 | 213 | 45 | ... | 40 | 1179 | 93 | 133 | 1170 | 1897 | 21 | 201 | 380 | 361 |
| 44.0 | 1 | 171 | 2976 | 361 | 3796 | 1453 | 68 | 967 | 204 | 47 | ... | 36 | 1217 | 95 | 126 | 1423 | 1811 | 14 | 159 | 455 | 383 |
| 45.0 | 0 | 68 | 1375 | 154 | 1996 | 659 | 27 | 476 | 111 | 18 | ... | 13 | 556 | 44 | 63 | 707 | 870 | 9 | 55 | 211 | 179 |
| 46.0 | 0 | 63 | 1373 | 169 | 1892 | 717 | 32 | 668 | 109 | 16 | ... | 13 | 576 | 44 | 60 | 767 | 790 | 7 | 66 | 227 | 203 |
| 47.0 | 0 | 49 | 1132 | 143 | 1816 | 687 | 21 | 496 | 80 | 21 | ... | 15 | 555 | 28 | 59 | 729 | 698 | 6 | 47 | 181 | 175 |
| 48.0 | 0 | 34 | 1140 | 165 | 1889 | 799 | 15 | 778 | 151 | 17 | ... | 17 | 570 | 46 | 60 | 855 | 806 | 8 | 41 | 208 | 206 |
| 49.0 | 0 | 38 | 1114 | 214 | 1823 | 743 | 19 | 595 | 67 | 29 | ... | 20 | 579 | 4 | 47 | 930 | 799 | 5 | 67 | 236 | 189 |
41 rows × 34 columns
In [171]:
px.bar(CS,title= "Distribuição de Cs de todos os carros")
In [172]:
df.columns.to_list()
Out[172]:
['year', 'make', 'model', 'trim', 'body', 'transmission', 'state', 'condition', 'odometer', 'color', 'interior', 'seller', 'mmr', 'sellingprice', 'saledate']
In [173]:
Odometer = df['odometer'].value_counts()
In [174]:
CS = pd.crosstab(df["condition"],df["odometer"]).sum()
In [175]:
CS
Out[175]:
odometer
1.0 979
2.0 10
3.0 5
4.0 7
5.0 13
...
621388.0 1
694978.0 2
959276.0 1
980113.0 1
999999.0 60
Length: 160427, dtype: int64
In [176]:
px.line(CS,title= "Distribuição de CS de todos os carros")
In [177]:
COLOR = df['color'].value_counts()
In [178]:
COLOR
Out[178]:
color black 93244 white 89233 silver 71251 gray 70640 blue 43060 red 37336 — 22068 gold 9563 green 9234 burgundy 7605 beige 7444 brown 5621 orange 1763 purple 1270 off-white 1244 yellow 1079 charcoal 429 turquoise 190 pink 39 lime 12 Name: count, dtype: int64
In [179]:
px.bar(COLOR,title= "Distribuição de Cor Exterior de todos os carros")
In [180]:
MC = pd.crosstab(df['make'],df["color"])
In [181]:
MC
Out[181]:
| color | beige | black | blue | brown | burgundy | charcoal | gold | gray | green | lime | off-white | orange | pink | purple | red | silver | turquoise | white | yellow | — |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| make | ||||||||||||||||||||
| Acura | 33 | 1253 | 281 | 15 | 106 | 4 | 94 | 967 | 72 | 0 | 28 | 2 | 0 | 8 | 148 | 726 | 2 | 711 | 0 | 53 |
| Aston Martin | 0 | 11 | 2 | 0 | 0 | 1 | 0 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 1 |
| Audi | 36 | 1490 | 445 | 35 | 2 | 4 | 36 | 742 | 23 | 0 | 2 | 0 | 0 | 4 | 143 | 703 | 1 | 915 | 3 | 218 |
| BMW | 152 | 5423 | 1734 | 211 | 55 | 6 | 134 | 3380 | 205 | 0 | 3 | 13 | 1 | 10 | 506 | 1825 | 5 | 3389 | 1 | 456 |
| Bentley | 0 | 32 | 22 | 2 | 0 | 1 | 0 | 19 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 13 | 0 | 10 | 0 | 2 |
| Buick | 270 | 546 | 292 | 196 | 122 | 3 | 286 | 493 | 79 | 0 | 39 | 0 | 0 | 22 | 256 | 887 | 2 | 910 | 0 | 67 |
| Cadillac | 306 | 1951 | 372 | 89 | 82 | 9 | 306 | 482 | 66 | 0 | 141 | 0 | 14 | 27 | 356 | 912 | 3 | 1135 | 0 | 64 |
| Chevrolet | 1479 | 8977 | 4865 | 579 | 810 | 66 | 1506 | 6819 | 696 | 1 | 52 | 264 | 12 | 138 | 4832 | 9774 | 22 | 12073 | 278 | 907 |
| Chrysler | 301 | 2649 | 1971 | 104 | 315 | 31 | 581 | 1995 | 200 | 0 | 92 | 7 | 0 | 49 | 1390 | 2511 | 11 | 2630 | 8 | 288 |
| Daewoo | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Dodge | 297 | 4855 | 3273 | 93 | 271 | 71 | 697 | 3899 | 193 | 1 | 28 | 383 | 1 | 46 | 3099 | 4170 | 7 | 5161 | 85 | 551 |
| FIAT | 8 | 130 | 42 | 16 | 0 | 9 | 0 | 98 | 67 | 0 | 1 | 12 | 0 | 0 | 112 | 64 | 0 | 130 | 21 | 15 |
| Ferrari | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 1 | 0 | 1 | 2 | 2 |
| Fisker | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 0 |
| Ford | 608 | 13416 | 6311 | 1041 | 1313 | 37 | 792 | 9528 | 1871 | 3 | 190 | 237 | 4 | 100 | 8194 | 9354 | 36 | 18262 | 131 | 9585 |
| GMC | 264 | 2032 | 650 | 130 | 181 | 18 | 221 | 1044 | 115 | 0 | 35 | 6 | 0 | 28 | 598 | 1167 | 6 | 2608 | 6 | 108 |
| Geo | 0 | 0 | 3 | 0 | 0 | 0 | 2 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 5 | 0 | 0 | 2 | 0 | 0 |
| HUMMER | 22 | 215 | 89 | 1 | 17 | 0 | 14 | 86 | 22 | 0 | 0 | 25 | 0 | 0 | 69 | 49 | 1 | 104 | 41 | 13 |
| Honda | 340 | 4838 | 2861 | 410 | 748 | 9 | 611 | 5500 | 663 | 0 | 20 | 59 | 0 | 71 | 1074 | 4324 | 12 | 2935 | 6 | 300 |
| Hyundai | 251 | 3156 | 2775 | 374 | 252 | 9 | 305 | 2814 | 221 | 1 | 23 | 64 | 0 | 34 | 1776 | 3561 | 6 | 2678 | 29 | 330 |
| Infiniti | 53 | 3885 | 1098 | 54 | 76 | 1 | 85 | 3852 | 75 | 0 | 35 | 6 | 0 | 58 | 229 | 1073 | 1 | 2600 | 0 | 830 |
| Isuzu | 8 | 30 | 14 | 1 | 5 | 0 | 3 | 17 | 20 | 0 | 0 | 1 | 0 | 0 | 20 | 32 | 0 | 27 | 0 | 1 |
| Jaguar | 25 | 325 | 158 | 3 | 22 | 1 | 50 | 153 | 93 | 0 | 0 | 0 | 0 | 3 | 86 | 177 | 0 | 142 | 0 | 19 |
| Jeep | 217 | 2903 | 1113 | 123 | 278 | 20 | 232 | 1506 | 591 | 2 | 5 | 185 | 0 | 9 | 1243 | 2057 | 3 | 1907 | 94 | 247 |
| Kia | 216 | 2846 | 1051 | 575 | 588 | 3 | 206 | 2319 | 517 | 1 | 49 | 77 | 0 | 107 | 1487 | 2691 | 2 | 2762 | 79 | 252 |
| Lamborghini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 | 0 |
| Land Rover | 7 | 498 | 60 | 24 | 5 | 1 | 16 | 173 | 32 | 0 | 1 | 5 | 0 | 3 | 39 | 111 | 1 | 321 | 4 | 21 |
| Lexus | 181 | 2602 | 689 | 138 | 132 | 47 | 412 | 1534 | 137 | 0 | 110 | 1 | 1 | 20 | 501 | 1732 | 1 | 1795 | 5 | 371 |
| Lincoln | 137 | 1162 | 209 | 85 | 113 | 1 | 95 | 389 | 57 | 0 | 92 | 3 | 0 | 12 | 365 | 531 | 0 | 838 | 0 | 837 |
| Lotus | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| MINI | 27 | 242 | 608 | 80 | 9 | 2 | 23 | 302 | 202 | 0 | 48 | 63 | 0 | 10 | 464 | 192 | 1 | 565 | 52 | 56 |
| Maserati | 0 | 35 | 11 | 3 | 6 | 1 | 0 | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 19 | 0 | 17 | 0 | 1 |
| Mazda | 49 | 1422 | 718 | 21 | 85 | 10 | 46 | 1755 | 231 | 0 | 4 | 15 | 0 | 33 | 653 | 876 | 13 | 895 | 7 | 97 |
| Mercedes-Benz | 126 | 4706 | 635 | 28 | 84 | 9 | 227 | 1588 | 70 | 0 | 6 | 1 | 0 | 17 | 402 | 2203 | 0 | 2572 | 3 | 1331 |
| Mercury | 101 | 225 | 151 | 16 | 47 | 1 | 107 | 155 | 104 | 0 | 7 | 1 | 0 | 1 | 121 | 275 | 0 | 227 | 0 | 32 |
| Mitsubishi | 96 | 632 | 342 | 33 | 45 | 0 | 27 | 532 | 76 | 0 | 7 | 70 | 4 | 6 | 350 | 627 | 2 | 716 | 3 | 93 |
| Nissan | 680 | 9282 | 3336 | 542 | 934 | 11 | 556 | 7151 | 374 | 0 | 123 | 53 | 1 | 306 | 2928 | 6417 | 9 | 8562 | 43 | 2735 |
| Oldsmobile | 24 | 23 | 19 | 11 | 9 | 0 | 30 | 19 | 25 | 0 | 0 | 0 | 0 | 1 | 32 | 62 | 1 | 61 | 0 | 1 |
| Plymouth | 0 | 5 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 2 | 0 | 2 | 1 | 0 |
| Pontiac | 57 | 621 | 447 | 31 | 115 | 2 | 118 | 515 | 128 | 0 | 1 | 17 | 0 | 13 | 527 | 735 | 8 | 626 | 13 | 44 |
| Porsche | 4 | 348 | 115 | 26 | 15 | 2 | 12 | 160 | 16 | 0 | 1 | 4 | 0 | 2 | 64 | 180 | 0 | 186 | 15 | 7 |
| Ram | 5 | 797 | 247 | 80 | 62 | 3 | 26 | 389 | 36 | 0 | 0 | 25 | 0 | 4 | 506 | 484 | 0 | 1312 | 1 | 67 |
| Rolls-Royce | 0 | 4 | 1 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 1 |
| Saab | 10 | 118 | 49 | 8 | 6 | 0 | 1 | 74 | 12 | 0 | 2 | 0 | 0 | 0 | 18 | 86 | 0 | 27 | 0 | 5 |
| Saturn | 71 | 414 | 394 | 34 | 59 | 1 | 118 | 279 | 133 | 1 | 0 | 19 | 0 | 16 | 243 | 495 | 2 | 270 | 6 | 15 |
| Scion | 4 | 277 | 211 | 16 | 61 | 2 | 3 | 259 | 19 | 0 | 6 | 39 | 0 | 61 | 80 | 197 | 0 | 263 | 8 | 39 |
| Subaru | 69 | 432 | 869 | 40 | 90 | 4 | 73 | 707 | 278 | 0 | 13 | 24 | 0 | 6 | 327 | 661 | 3 | 689 | 2 | 82 |
| Suzuki | 51 | 153 | 121 | 7 | 33 | 0 | 50 | 99 | 10 | 0 | 6 | 12 | 0 | 1 | 114 | 187 | 0 | 129 | 0 | 4 |
| Tesla | 0 | 5 | 1 | 0 | 0 | 0 | 0 | 8 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 3 | 0 | 1 |
| Toyota | 700 | 5351 | 2977 | 254 | 329 | 25 | 1319 | 6225 | 1229 | 0 | 42 | 41 | 1 | 34 | 3157 | 6807 | 27 | 5877 | 22 | 896 |
| Volkswagen | 103 | 2146 | 1052 | 67 | 95 | 3 | 88 | 2117 | 176 | 2 | 29 | 19 | 0 | 9 | 629 | 1583 | 1 | 1606 | 97 | 958 |
| Volvo | 55 | 707 | 337 | 24 | 26 | 1 | 55 | 445 | 84 | 0 | 2 | 4 | 0 | 0 | 130 | 654 | 1 | 500 | 1 | 56 |
| smart | 1 | 68 | 38 | 1 | 0 | 0 | 0 | 21 | 9 | 0 | 0 | 4 | 0 | 0 | 49 | 57 | 0 | 76 | 10 | 9 |
In [182]:
px.bar(MC,title= "Distribuição de Cor Exterior de todos os carros")
In [183]:
Interior = df['interior'].value_counts()
In [184]:
Interior
Out[184]:
interior black 209865 gray 152616 beige 50469 tan 37723 — 9864 brown 7102 red 1128 silver 1002 blue 917 off-white 355 gold 288 purple 287 white 212 green 200 burgundy 154 orange 124 yellow 19 Name: count, dtype: int64
In [185]:
px.bar(Interior,title= "Distribuição de Cor Interior de todos os carros")
In [186]:
Seller = df['seller'].value_counts()
In [187]:
Seller
Out[187]:
seller
ford motor credit company llc 17756
the hertz corporation 16286
nissan-infiniti lt 15777
santander consumer 14245
avis corporation 11471
...
nu image auto llc 1
derek scott's auto park 1
maserati north america inc 1
alternative financial group inc 1
lienemann auto sales inc 1
Name: count, Length: 11923, dtype: int64
In [188]:
Seller = Seller[Seller > 1000]
Seller
Out[188]:
seller
ford motor credit company llc 17756
the hertz corporation 16286
nissan-infiniti lt 15777
santander consumer 14245
avis corporation 11471
...
td auto finance 1137
kevins marysville auto sales inc 1099
ars/enterprise 1084
oriental bank 1044
pv holding inc/gdp 1032
Name: count, Length: 62, dtype: int64
In [189]:
px.line(Seller,title= "Distribuição das entidades vendedoras dos veículos (todos os carros)")
In [190]:
SMM = pd.crosstab(df['seller'],df["make"]).sum()
In [191]:
SMM
Out[191]:
make Acura 4503 Aston Martin 24 Audi 4802 BMW 17509 Bentley 105 Buick 4470 Cadillac 6315 Chevrolet 54150 Chrysler 15133 Daewoo 2 Dodge 27181 FIAT 725 Ferrari 17 Fisker 9 Ford 81013 GMC 9217 Geo 16 HUMMER 768 Honda 24781 Hyundai 18659 Infiniti 14011 Isuzu 179 Jaguar 1257 Jeep 12735 Kia 15828 Lamborghini 3 Land Rover 1322 Lexus 10409 Lincoln 4926 Lotus 1 MINI 2946 Maserati 109 Mazda 6930 Mercedes-Benz 14008 Mercury 1571 Mitsubishi 3661 Nissan 44043 Oldsmobile 318 Plymouth 16 Pontiac 4018 Porsche 1157 Ram 4044 Rolls-Royce 16 Saab 416 Saturn 2570 Scion 1545 Subaru 4369 Suzuki 977 Tesla 23 Toyota 35313 Volkswagen 10780 Volvo 3082 smart 343 dtype: int64
In [192]:
px.bar(SMM,title= "Distribuição das entidades responsáveis pela venda dos veículos em todos os carros")
In [193]:
df.columns.to_list()
Out[193]:
['year', 'make', 'model', 'trim', 'body', 'transmission', 'state', 'condition', 'odometer', 'color', 'interior', 'seller', 'mmr', 'sellingprice', 'saledate']
In [194]:
df['profit_or_loss'] = df['mmr'] - df['sellingprice']
In [195]:
df['profit_or_loss'].describe()
Out[195]:
count 472325.000000 mean 146.546905 std 1741.175011 min -207200.000000 25% -650.000000 50% 50.000000 75% 800.000000 max 87750.000000 Name: profit_or_loss, dtype: float64
In [196]:
PL = df['profit_or_loss'].value_counts()
In [197]:
px.line(PL,title= "Distribuição dos lucros ou prejuízos de todos os carros")